Spell Checking in Spanish: The Case of Diacritic Accents

Authors

  • Jordi Atserias Batalla
  • María Fuentes Fort
  • Rogelio Nazar
  • Irene Renau
Abstract

This article presents the problem of diacritic restoration (or diacritization) in the context of spell-checking, with a focus on an orthographically rich language such as Spanish. We argue that, despite the large volume of work published on diacritization, currently available spell-checking tools still have no proper solution for cases where both forms of a word are listed in the checker’s dictionary. This is the case, for instance, when a word form exists with and without diacritics, such as continuo ‘continuous’ and continuó ‘he/she/it continued’, or when different diacritics make other word distinctions, as in continúo ‘I continue’. We propose a very simple solution based on a word bigram model derived from correctly typed Spanish texts and evaluate the ability of this model to restore diacritics in artificial as well as real errors. The case of diacritics is meant only as an example of the possible applications of this idea; we believe that the same method could be applied to other kinds of orthographic or even grammatical errors. Moreover, given that no explicit linguistic knowledge is required, the proposed model can be used with other languages provided that a large normative corpus is available.
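The word bigram idea described in the abstract can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: the names `strip_accents` and `BigramRestorer`, the scoring rule (sum of the left and right bigram counts, with the unigram count as tie-breaker), the restriction to acute accents on vowels, and the four-sentence corpus are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

# Assumption: only acute accents on vowels are treated as diacritics here.
ACCENT_MAP = str.maketrans("áéíóú", "aeiou")


def strip_accents(word):
    """Return the word with acute accents removed from its vowels."""
    return word.translate(ACCENT_MAP)


class BigramRestorer:
    """Toy diacritic restorer built from raw bigram counts (hypothetical)."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        # Maps an accent-free form to every accented variant seen in the corpus.
        self.variants = defaultdict(set)
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            for tok in tokens:
                self.unigrams[tok] += 1
                self.variants[strip_accents(tok)].add(tok)
            self.bigrams.update(zip(tokens, tokens[1:]))

    def score(self, prev, cand, nxt):
        # Left plus right bigram counts; the unigram count breaks ties.
        context = self.bigrams[(prev, cand)] + self.bigrams[(cand, nxt)]
        return (context, self.unigrams[cand])

    def restore(self, sentence):
        tokens = sentence.lower().split()
        padded = ["<s>"] + tokens + ["</s>"]
        out = []
        for i, tok in enumerate(tokens, start=1):
            # Candidates: known variants of the accent-free form, plus the typed form.
            candidates = self.variants.get(strip_accents(tok), set()) | {tok}
            prev = out[-1] if out else "<s>"  # already-restored left neighbour
            nxt = padded[i + 1]               # right neighbour as typed
            out.append(max(sorted(candidates),
                           key=lambda c: self.score(prev, c, nxt)))
        return " ".join(out)


# Hypothetical stand-in for a large corpus of correctly typed Spanish text.
corpus = [
    "el ruido continuo molesta",
    "ella continuó con el trabajo",
    "yo continúo con el trabajo",
    "el proceso continuo funciona",
]
model = BigramRestorer(corpus)
print(model.restore("ella continuo con el trabajo"))
print(model.restore("yo continuo con el trabajo"))
```

All three forms (continuo, continuó, continúo) are valid dictionary words, so a dictionary lookup alone cannot flag the error; in this sketch only the neighbouring words decide which variant is chosen. A realistic model would need smoothing and a far larger corpus.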


Similar resources

Grepator: Accents & Case Mix for Thesaurus

There is a real need among researchers and students for pedagogical resources. In France, information retrieval techniques have been developed, for example in the Doc'CISMeF web site. As in PubMed, documents are indexed with (French) MeSH terms. One of the problems discovered in quality studies is the mismatch between user requests and the MeSH controlled vocabulary. Moreover, French (bu...


Building ancient Spanish dictionaries for spell-checking of DL texts

Being aware of the usefulness of spell-checkers for correcting modern works, and lacking this facility for ancient texts, we decided to build dictionaries for ancient Spanish. This decision led to new problems and new questions. We have built a time-aware system of dictionaries that takes into account the temporal dynamics of language, to help solve the problem of ancient Spanish spell-ch...


Rule-Based Spanish Morphological Analyzer Built From Spell Checking Lexicon

Preprocessing tools for automated text analysis have become more widely available in major languages, but non-English tools are often still limited in their functionality. When working with Spanish-language text, researchers can easily find tools for tokenization and stemming, but may not have the means to extract more complex word features like verb tense or mood. Yet Spanish is a morphological...


A contrastive study of Catalan and Spanish declarative intonation: Focus on Majorcan dialects

The goal of the present paper is to identify some of the differences in the intonation of Catalan and Spanish as spoken in Majorca. The tonal features we investigated were: (1) utterance-final pitch accents in broad focus declaratives, and (2) local contrastive focus pitch accents. Previous research, mostly on related varieties, such as Central Catalan and Castilian Spanish, had indirectly sugg...


An extended spell checker for unknown words

Spell checking is considered a solved problem, but with the rapid development of natural language processing, new results are slowly extending the means of spell checking towards grammar checking. In this article I review some of the spell checking error classes in a broader sense, the related problems, their state-of-the-art solutions and their different nature on different types of lan...




Publication date: 2012